Building Decision Trees on Records Linked through Key References

Authors

  • Ke Wang
  • Yabo Xu
  • Philip S. Yu
  • Rong She
Abstract

We consider the classification problem where the data are given as a collection of tables related by a hierarchical structure of key references, with the class labels contained in the root table. Each parent table represents a many-to-many relationship type among its child tables. Such data are frequently found in relational databases, data warehouses, XML data, and biological databases. One solution is to join all tables into a universal table based on the recorded relationships, but this suffers from a significant blowup caused by the many-to-many relationships. Another solution is to treat the problem as relational learning, at the cost of increased complexity and degraded performance. We propose a novel method that builds exactly the same decision tree classifier as would be built from the joined table, but without the blowup required in the traditional approach.
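
The blowup and the idea of avoiding it can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy tables, the function names, and the two-child schema are all hypothetical, chosen only to show that the class counts a decision tree split needs can be obtained by multiplying match counts instead of materializing the joined table.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper).
# Root table: each row references child tables A and B by key and carries a class label.
root = [
    ("a1", "b1", "yes"),
    ("a1", "b2", "no"),
    ("a2", "b1", "yes"),
]
# Child tables: a key may match several records, which is what makes the join blow up.
table_a = [("a1", "red"), ("a1", "blue"), ("a2", "red")]
table_b = [("b1", "small"), ("b1", "large"), ("b2", "small")]


def group_by_key(table):
    """Map each key to the list of attribute values stored under it."""
    groups = defaultdict(list)
    for key, value in table:
        groups[key].append(value)
    return groups


def class_counts_via_join(root, table_a, table_b):
    """Materialize the universal table, then count (A-value, class) pairs."""
    a_groups, b_groups = group_by_key(table_a), group_by_key(table_b)
    joined = [
        (a_val, b_val, label)
        for a_key, b_key, label in root
        for a_val in a_groups[a_key]
        for b_val in b_groups[b_key]
    ]
    counts = Counter((a_val, label) for a_val, _, label in joined)
    return counts, len(joined)


def class_counts_without_join(root, table_a, table_b):
    """Same counts, computed by weighting instead of joining (assumed sketch)."""
    a_groups, b_groups = group_by_key(table_a), group_by_key(table_b)
    counts = Counter()
    for a_key, b_key, label in root:
        # Each A record under a_key pairs with this many B records in the join.
        weight = len(b_groups[b_key])
        for a_val in a_groups[a_key]:
            counts[(a_val, label)] += weight
    return counts


joined_counts, joined_size = class_counts_via_join(root, table_a, table_b)
direct_counts = class_counts_without_join(root, table_a, table_b)
print(joined_size, joined_counts == direct_counts)
```

In this toy case three root rows expand to eight joined rows, yet both functions return identical counts, so any split criterion computed from those counts would also be identical.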

Similar resources

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or not-linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Improving Classifications of Medical Data Based on Fuzzy ART2 Decision Trees

Analyzing given medical databases provides valuable references for classifying other patients’ symptoms. This study presents a strategy for discovering fuzzy decision trees from medical databases, in particular Haberman’s Survival database and the Blood Transfusion Service Center database. Haberman’s Survival database helps doctors treat and diagnose a group of patients who show similar past medi...

Knowledge Discovery through SysFor - a Systematically Developed Forest of Multiple Decision Trees

Decision tree based classification algorithms like C4.5 and Explore build a single tree from a data set. The two main purposes of building a decision tree are to extract various patterns/logic-rules existing in a data set, and to predict the class attribute value of an unlabeled record. Sometimes a set of decision trees, rather than just a single tree, is also generated from a data set. A set o...

A Survey of Information Theory Application on Data Mining

In the data mining area, "classification" is one of the most important issues. Generating decision trees is a very useful and reliable solution. There are several ways to construct a decision tree. Among them, information theory is a very effective and scalable method. This is a survey project in information theory. We focus on the generation of decision tree for classific...

Feature Generation using Ontologies during Induction of Decision Trees on Linked Data

Linked data has the potential of interconnecting data from different domains, bringing new potentials to machine agents to provide better services for web users. The ever increasing amount of linked data in government open data, social linked data, linked medical and patients’ data provides new opportunities for data mining and machine learning. Both are however strongly dependent on the select...

Publication date: 2005